1 Introduction

In the last decade, algebraic methods have led to much progress in classifying 
the complexity of the non-uniform Constraint Satisfaction Problem (CSP). 
The programming language Datalog, whose origins lie in logic programming 
and database theory, has been playing an important role in describing the 
complexity of CSP since at least the classic paper of T. Feder and M. Vardi HH,
where Feder and Vardi used Datalog to define CSPs of bounded width. In
an effort to describe the finer hierarchy of CSP complexity, V. Dalmau jH]
asked which CSPs can be solved using the weaker language of linear Datalog, 
and later L. Egri, B. Larose and P. Tesson [10] introduced the even weaker 
symmetric Datalog. 

We want to show that if CSP(A) can be solved by a linear Datalog program
(alternatively, has bounded pathwidth duality) and A is n-permutable
for some n, then CSP(A) can be solved by a symmetric Datalog program (and
so lies in L). While this yields an “if and only if” description of symmetric
Datalog, it is not a perfect characterization: describing the structures A
such that CSP(A) is solvable by linear Datalog is an open problem. However,
once CSPs for which linear Datalog works are classified, we will immediately
get an equally good classification of symmetric Datalog CSPs.

In particular, should it turn out that $SD(\vee)$ implies bounded pathwidth
duality, we would have a neat characterization of problems solvable by symmetric
Datalog: It is the class of problems whose algebras omit all tame
congruence theory types except for the Boolean type (we go into greater
detail about this in the Conclusions).

Our result is similar to, but incomparable with, what V. Dalmau and B.
Larose have shown [S]: Their proof shows that 2-permutability plus being
solvable by Datalog implies solvability by symmetric Datalog. We require
both less (n-permutability for some n as opposed to 2-permutability) and
more (linear Datalog solves CSP(A) as opposed to Datalog solves CSP(A)).

Our proof strategy is this: First we show in Section [3] how we can use
symmetric Datalog to derive new instances from the given instance. Basically,
we show that we can run a smaller symmetric Datalog program from
inside another. This will later help us to reduce “bad” CSP instances to a
form that is easy to deal with. Then, in Section [4] we introduce path CSP
instances and show how n-permutability restricts the kind of path instances
we can encounter. We use this knowledge in Section [3] to show that for any
n-permutable A, there is a symmetric Datalog program that decides path
instances of CSP(A). Finally, in Section [7] we use linear Datalog to go from
solving path instances to solving general CSP instances and finish our proof.

When writing this paper, we were mainly interested in ease of exposition, 
not in obtaining the fastest possible algorithm. We should therefore warn 
any readers hoping to implement our method in practice that the size of 
our symmetric Datalog program grows quite quickly with the size of A and 
the number of Hagemann-Mitschke terms involved. The main culprit is the
Ramsey theory argument in Lemma [15].

2 Preliminaries 

All numbers in this paper are integers (most of them positive). If n is a
positive integer and a, b are integers, we will use the notation
$[n] = \{1, 2, \dots, n\}$ and the notation
$[a,b] = \{i \in \mathbb{Z} : a \le i \le b\}$ (and variants such as
$[a,b) = \{i \in \mathbb{Z} : a \le i < b\}$).

We will be talking quite a bit about tuples - either tuples of elements
of A or tuples of variables. We will treat both cases similarly: An n-tuple
on Y is a mapping $\sigma\colon [n] \to Y$. We will denote the length of the
tuple $\sigma$ by $|\sigma|$, while $\operatorname{Im} \sigma$ will be the set
of elements used in $\sigma$. Note that if e.g. $\sigma = (x, x, y)$, we can
have $|\sigma| > |\operatorname{Im} \sigma|$.
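The conventions for tuples can be illustrated by a short sketch. The function names `length`, `image` and `apply` are ours, chosen only for this illustration; a tuple is modelled as a Python tuple and a mapping as a dict.

```python
def length(sigma):
    """|sigma|: the number of positions of the tuple."""
    return len(sigma)


def image(sigma):
    """Im sigma: the set of elements that occur somewhere in the tuple."""
    return set(sigma)


def apply(f, sigma):
    """f(sigma): apply the mapping f (given as a dict) to every coordinate."""
    return tuple(f[a] for a in sigma)


sigma = ("x", "x", "y")
f = {"x": 0, "y": 1}
print(length(sigma), image(sigma), apply(f, sigma))
```

Here the tuple has length 3 but its image has only two elements, which is exactly the situation described above.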

A relation on A is any $R \subseteq A^X$ where X is some (finite) set. The arity
of R is the cardinality of X. Most of the time, we will use X = [n] for some
$n \in \mathbb{N}$ and write simply $R \subseteq A^n$.

When $R \subseteq A^n$ is an n-ary relation and $\sigma = (a_1, \dots, a_n)$ is
an n-tuple, we will often write $R(\sigma)$ instead of
$(a_1, \dots, a_n) \in R$. Given a mapping $f\colon A \to B$ and an n-tuple
$\sigma \in A^n$, we will denote by $f(\sigma)$ the n-tuple
$(f(\sigma(1)), \dots, f(\sigma(n))) \in B^n$.

A relational structure consists of a set A together with a family $\mathcal{R}$ of
relations on A, which we call basic relations of A. In this paper, we will only
consider finite relational structures with finitely many basic relations. We
will not allow nullary relations or relations of infinite arity.

An n-ary operation on A is any mapping $t\colon A^n \to A$. We say that
an n-ary operation t preserves the relation R if for all $r_1, \dots, r_n \in R$ we
have $t(r_1, r_2, \dots, r_n) \in R$ (note that $t(r_1, \dots, r_n)$ is a tuple here,
obtained by applying t coordinatewise). Given a
relational structure A, an n-ary operation t on A is a polymorphism of A if
t preserves all basic relations of A.
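The definition of "t preserves R" can be checked by brute force on small examples. The following is a minimal sketch; the helper `preserves` and the example relations `LEQ` and `NEQ` on {0, 1} are our own illustrations, not objects from the paper.

```python
from itertools import product


def preserves(op, arity, relation):
    """Return True if applying `op` coordinatewise to any `arity` tuples
    of `relation` yields a tuple that again lies in `relation`."""
    rel = set(relation)
    for rows in product(rel, repeat=arity):
        # zip(*rows) iterates over the coordinates (columns) of the chosen tuples.
        result = tuple(op(*column) for column in zip(*rows))
        if result not in rel:
            return False
    return True


def majority(x, y, z):
    """The ternary majority operation on a two-element set."""
    return x if x == y or x == z else y


LEQ = {(0, 0), (0, 1), (1, 1)}  # the order relation on {0, 1}
NEQ = {(0, 1), (1, 0)}          # disequality on {0, 1}

print(preserves(majority, 3, LEQ), preserves(majority, 3, NEQ), preserves(max, 2, NEQ))
```

An operation is a polymorphism of a structure exactly when `preserves` succeeds for every basic relation; in this toy example majority preserves both relations, while the binary maximum fails on `NEQ` because it maps the pair of tuples (0, 1), (1, 0) to (1, 1).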

The algebra of polymorphisms of A is the algebra with universe A and 
the set of operations consisting of all polymorphisms of A. We will use the 
shorthand A for this algebra. 

The relational clone of A is the set of all relations on A that can be defined
from the basic relations of A by primitive positive definitions - formulas that
only use conjunction, existential quantification and symbols for variables. We
will sometimes call members of the relational clone of A admissible relations 
of A. The importance of relational clone comes from the fact that A preserves 
precisely all relations on A that belong in the relational clone of A mm- 

Let us fix a relational structure $A = (A, \mathcal{R})$ and define the non-uniform
Constraint Satisfaction Problem with the right side A, or CSP(A) for short.
This problem can be stated in several mostly equivalent ways (in particular,
many people prefer to think of CSP(A) as a question about homomorphisms
between relational structures). We define CSP(A) in the language of logical
formulas.

Definition 1. An instance I = (V, C) of CSP(A) consists of a set of variables
V and a set of constraints C. Each constraint is a pair $(\sigma, R)$ where
$\sigma \in V^n$ is the scope of the constraint and $R \in \mathcal{R}$ is the
(n-ary) constraint relation. A solution of I is a mapping $f\colon V \to A$ such
that for all constraints $(\sigma, R) \in C$ we have $f(\sigma) \in R$.

If I is an instance, we will say that I is satisfiable if there exists a solution
of I and unsatisfiable otherwise. The Constraint Satisfaction Problem with
target structure A has as its input an instance I of CSP(A) (encoded in a
straightforward way as a list of constraints), and the output is the answer to
the question “Is I satisfiable?”
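To make the definition of an instance and of a solution concrete, here is a brute-force sketch. It assumes constraints are stored as pairs (scope, relation) with the relation given as a set of tuples; the names `is_solution` and `solve` are ours, and the exhaustive search is only meant for tiny examples.

```python
from itertools import product


def is_solution(f, constraints):
    """f is a dict V -> A; every constraint (scope, relation) must be satisfied."""
    return all(tuple(f[v] for v in scope) in relation
               for scope, relation in constraints)


def solve(variables, domain, constraints):
    """Try all mappings V -> A and return the first solution, or None."""
    variables = list(variables)
    for values in product(domain, repeat=len(variables)):
        f = dict(zip(variables, values))
        if is_solution(f, constraints):
            return f
    return None


NEQ = {(0, 1), (1, 0)}
path = [(("x", "y"), NEQ), (("y", "z"), NEQ)]
triangle = path + [(("x", "z"), NEQ)]

print(solve(["x", "y", "z"], [0, 1], path))
print(solve(["x", "y", "z"], [0, 1], triangle))
```

With the disequality relation on {0, 1} as the only basic relation, the path instance is satisfiable while the triangle instance is not (it encodes 2-colouring a triangle).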
Figure 1: An example of microstructure with six variables $x_1, x_2, \dots, x_6$ and
five binary relations (instance solution in bold).

If I = (V, C) is an instance of CSP(A), then any CSP(A) instance J =
(U, D) with $U \subseteq V$ and $D \subseteq C$ is called a subinstance of I. It is easy to see that
if I has an unsatisfiable subinstance then I itself is unsatisfiable. If $U \subseteq V$,
the subinstance of I = (V, C) induced by U is the instance $I|_U = (U, D)$ where
a constraint $(\sigma, R) \in C$ belongs to D if and only if $\operatorname{Im} \sigma \subseteq U$.

We can draw CSP instances whose constraints’ arities are at most 2 as
microstructures (also known as potato diagrams among universal algebraists):
For each variable x we draw the set $B_x \subseteq A$ equal to the intersection of all
unary constraints on x. For each binary constraint we draw lines joining the
pairs of elements in corresponding sets. A solution of the instance corresponds
to the selection of one element $b_x$ in each set $B_x$ in such a way that
whenever C = ((x, y), R) is a constraint, we have $(b_x, b_y) \in R$ (see Figure [1]
for an example).

Obviously, CSP(A) is always in the class NP, since we can check in polynomial
time whether a mapping $f\colon V \to A$ is a solution. Had we let the
structure A be a part of the input, the constraint satisfaction problem would
be NP-complete (it is easy to encode, say, 3-colorability of a graph as a CSP
instance). However, when one fixes the structure A, CSP(A) can become
easier. In particular, for some relational structures A, the problem CSP(A)
can be solved by Datalog or one of its variants.

2.1 Datalog 

The Datalog language offers a way to check the local consistency of CSP
instances. A Datalog program P for solving CSP(A) consists of a list of rules
of the form

\[ R(\rho) \leftarrow S_1(\sigma_1), S_2(\sigma_2), \dots, S_\ell(\sigma_\ell), \]

where $R, S_1, \dots, S_\ell$ are predicates and $\rho, \sigma_1, \dots, \sigma_\ell$ are sequences of variables
(we will denote the set of all variables used in the program by X). Some
predicates of P are designated as goal predicates (more on those later).

In general, the predicates can be symbols without any meaning, but in
the programs we are about to construct each predicate will correspond to a
relation on A, i.e. a predicate $S(x_1, x_2)$ would correspond to some $S \subseteq A^2$.
This will often get us in a situation where, say, the symbol R stands at the
same time for a relation on A, a predicate of a Datalog program, and a
relation on the set V of variables (see below). For the most part, we will
depend on context to tell these meanings of R apart, but if there is a risk of
confusion we will employ the notation $R^{\mathbb{A}}$ for $R^{\mathbb{A}} \subseteq A^n$, $R^{P}$ for predicates of
P, and $R^{V}$ for $R^{V} \subseteq V^n$.

Given a Datalog program P that contains predicates for all basic relations
of a relational structure A, we can run P on an instance I = (V, C) of CSP(A)
as follows: For each n-ary predicate $R^{P}$ of P, we keep in memory an n-ary
relation $R^{V} \subseteq V^n$. Initially, all such relations are empty. To load I into the
program, we go through C and for every $(\sigma, R^{\mathbb{A}}) \in C$, we add $\sigma$ to $R^{V}$ (when
designing P, we will always make sure that there is a predicate $R^{P}$ for each
basic relation $R^{\mathbb{A}}$ of A).

After this initialization, P keeps adding tuples of V into relations $R^{V}$ as
per the rules of P: If we can assign values to variables so that the right hand
side of some rule holds, then we put the corresponding tuple into the left
hand side relation R.

More formally, we say that P(I) derives $R^{V}(\rho)$ for $\rho \in V^n$, writing
$P(I) \vdash R^{V}(\rho)$, if one of the following happens: We have $(\rho, R^{\mathbb{A}}) \in C$, or
P contains a rule of the form

\[ R^{P}(\tau) \leftarrow S_1^{P}(\sigma_1), S_2^{P}(\sigma_2), \dots, S_\ell^{P}(\sigma_\ell), \]

where $\tau, \sigma_1, \dots, \sigma_\ell$ are tuples of variables from the set of variables X, and
there exists a mapping (evaluation) $\omega\colon X \to V$ such that $\omega(\tau) = \rho$ and for
each $i = 1, \dots, \ell$ we have $P(I) \vdash S_i^{V}(\omega(\sigma_i))$.

If a Datalog program P ever uses a rule with a goal predicate on its left
side, then the program outputs “Yes” and halts. We will use the symbol
G to stand for any of the goal predicates, writing for example $P(I) \vdash G$ as
a shorthand for “P run on I derives a relation that is designated as a goal
predicate.” Another way to implement goal predicates, used e.g. in m , is
to introduce a special nullary relation G that is the goal. We don’t want to
deal with nullary relations, but the distinction is purely a formal one: Should
the reader want a program with a nullary G, all that is needed is to simply
introduce rules of the form $G \leftarrow R(x_1, x_2, \dots)$ where R ranges over the list
of goal predicates.

If a goal predicate is not reached, the program P(I) runs until it cannot
derive any new statements, at which point it outputs “No” and halts.
Thanks to the monotone character of Datalog rules (we only add tuples to
predicates, never remove them), any given Datalog program can be evaluated
in time polynomial in the size of its input instance I.
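The evaluation procedure just described can be sketched as a naive fixed-point computation. This is only an illustration under simplifying assumptions: rules are pairs (head atom, list of body atoms), an atom is a pair (predicate name, tuple of variables), and every head variable is assumed to occur in the body. The function names and the example program (which detects closed walks of odd length, i.e. unsatisfiable instances of 2-colouring) are ours.

```python
def matches(body, derived):
    """Return every evaluation (dict: rule variable -> instance variable)
    under which all atoms of the body are already derived."""
    envs = [{}]
    for pred, variables in body:
        next_envs = []
        for env in envs:
            for fact in derived.get(pred, ()):
                new_env = dict(env)
                # setdefault binds a fresh variable or returns the old binding.
                if all(new_env.setdefault(v, a) == a
                       for v, a in zip(variables, fact)):
                    next_envs.append(new_env)
        envs = next_envs
    return envs


def evaluate(rules, facts):
    """Apply the rules until no new tuple can be added (monotone fixed point)."""
    derived = {pred: set(tuples) for pred, tuples in facts.items()}
    changed = True
    while changed:
        changed = False
        for (head_pred, head_vars), body in rules:
            for env in matches(body, derived):
                fact = tuple(env[v] for v in head_vars)
                if fact not in derived.setdefault(head_pred, set()):
                    derived[head_pred].add(fact)
                    changed = True
    return derived


def symmetric_edges(pairs):
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


# Hypothetical example program: Odd(x, y) holds if there is a walk of odd length.
RULES = [
    (("Odd", ("x", "y")), [("E", ("x", "y"))]),
    (("Odd", ("x", "y")), [("Odd", ("x", "z")), ("E", ("z", "w")), ("E", ("w", "y"))]),
    (("Goal", ("x",)), [("Odd", ("x", "x"))]),
]
triangle = {"E": symmetric_edges([(1, 2), (2, 3), (1, 3)])}
path = {"E": symmetric_edges([(1, 2), (2, 3)])}

print(bool(evaluate(RULES, triangle).get("Goal")), bool(evaluate(RULES, path).get("Goal")))
```

Because tuples are only ever added, the loop performs at most polynomially many successful rule applications for a fixed program, which matches the polynomial bound stated above. The goal predicate is unary here, in line with the convention of avoiding nullary relations.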

Given a Datalog program P and a relational structure A we say that P
decides CSP(A) if P run on a CSP(A) instance I reaches a goal predicate
if and only if I is unsatisfiable. We say that CSP(A) can be solved by
Datalog if there is a Datalog program P that decides CSP(A). (Strictly
speaking, we should say that P decides the complement of CSP(A) in this
situation, but that is cumbersome.)

For $R_1, \dots, R_k$ relations and $\sigma_1, \dots, \sigma_k$ tuples of variables, we define the
conjunction $R_1(\sigma_1) \wedge \dots \wedge R_k(\sigma_k)$ as a relation (resp. predicate) on $\bigcup_{i=1}^{k} \operatorname{Im} \sigma_i$.
For example, $R_1(x_2, x_2, x_2) \wedge R_2(x_3, x_4)$ is a relation of arity 3 on the three
variables $x_2, x_3, x_4$.

To slim down our notation, we will for the most part not distinguish the
abstract statement of a Datalog rule (with variables from X) and the concrete
realization of the rule (with the evaluation $\omega\colon X \to V$). For example, if P
contained this rule $\alpha$:

\[ R(x, z) \leftarrow S(x, y), T(y, z) \]

and it happened that $P(I) \vdash S(1, 2)$ and $P(I) \vdash T(2, 2)$, then instead of saying
that we are applying the rule $\alpha$ with the evaluation $\omega(x) = 1$, $\omega(y) = \omega(z) = 2$
to add (1, 2) into R, we would simply state that we are using the rule

\[ R(1, 2) \leftarrow S(1, 2), T(2, 2), \]

even though that means silently identifying y and z in the original rule $\alpha$.

The power of Datalog for CSP is exactly the same as that of local consistency
methods. L. Barto and M. Kozik have given several different natural
characterizations of structures A such that Datalog solves CSP(A) [1]. However,
this is not the end of the story, for there are natural fragments of
Datalog which have lower expressive power, but also lower computational
complexity.

Predicates that can appear on the left hand side of some rule (and therefore
can have new tuples added into them) are called intensional database
symbols (IDB). Having IDBs on the right hand side of rules enables recursion.
Therefore, limiting the occasions when IDBs appear on the right hand
side of rules results in fragments of Datalog that can be evaluated faster.

An extreme case of such restriction happens when there is never an IDB
on the right hand side of any rule. It is easy to see that such Datalog programs
can solve CSP(A) if and only if A has a finite (also called “finitary”) duality.
This property is equivalent to CSP(A) being definable in first order logic
by i (see also the survey m). Structures of finite duality are both well
understood and rare, so let us look at more permissive restrictions.

A Datalog program is linear if there is at most one IDB on the right hand
side of any rule. When evaluating linear Datalog programs, we need to only
consider chains of rules that don’t branch: It is straightforward to show by
induction that if P is a linear Datalog program and I is an instance of the
corresponding CSP, then $P(I) \vdash R(\rho)$ if and only if $(\rho, R)$ is a constraint of
I or there is a sequence of statements

\[ U_1(\varphi_1), U_2(\varphi_2), \dots, U_m(\varphi_m) = R(\rho) \]

such that for each $i = 2, \dots, m$ the program P has a rule of the form

\[ U_i(\varphi_i) \leftarrow U_{i-1}(\varphi_{i-1}), T_1(\tau_1), \dots, T_k(\tau_k), \]

where $U_{i-1}$ is the IDB in the rule and $(\tau_j, T_j)$ are constraints of I for all
$j = 1, \dots, k$. The first statement, $U_1(\varphi_1)$, is a special case as P must derive
it without using IDBs, i.e. there is a rule of P of the form

\[ U_1(\varphi_1) \leftarrow T_1(\tau_1), \dots, T_k(\tau_k), \]

where all $(\tau_1, T_1), \dots, (\tau_k, T_k)$ are constraints of I.

(Note that this is the first time we are using the “concrete realization of
the abstract rule” shorthand.) We will call such a sequence $U_1(\varphi_1), \dots, U_m(\varphi_m)$
a derivation of $R(\rho)$.

Another way to view the computation of a linear Datalog program is to
use the digraph $\mathcal{G}(P, I)$: The set of vertices of $\mathcal{G}(P, I)$ will consist of all
pairs $(\rho, R)$ where R is an n-ary IDB predicate of P and $\rho \in V^n$. The graph
$\mathcal{G}(P, I)$ contains the edge from $(\rho, R)$ to $(\sigma, S)$ if P contains a rule of the
form

\[ S(\sigma) \leftarrow R(\rho), T_1(\tau_1), \dots, T_k(\tau_k), \]

where all $(\tau_i, T_i)$ are constraints of I.

It is easy to see that $P(I) \vdash G$ if and only if there is a tuple $\rho$ and an IDB
R such that $P(I) \vdash R(\rho)$ in one step, without the use of intermediate IDBs,
and there is a directed path from $(\rho, R)$ to a goal predicate in $\mathcal{G}(P, I)$. It is
straightforward to verify that deciding the existence of such a path is in NL. (In
fact, deciding directed connectivity is NL-complete [H Theorem 4.18, p. 89]
and since there is a linear Datalog program that decides directed connectivity,
it follows that evaluating linear Datalog programs is NL-complete.)
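The reachability view can be sketched as a breadth-first search. In this illustration the edges are oriented from the IDB in the body of a rule to the IDB in its head, which is the orientation under which a path ending in a goal predicate witnesses a derivation. The helper `odd_walk_graph` builds the digraph for one hypothetical linear program (Odd(x,y) ← E(x,y); Odd(x,y) ← Odd(x,z), E(z,w), E(w,y); Goal(x) ← Odd(x,x)); all names are our own.

```python
from collections import deque


def reaches_goal(edges, start_nodes, goal_predicates):
    """BFS over nodes (tuple of instance variables, IDB name)."""
    seen = set(start_nodes)
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        _, predicate = node
        if predicate in goal_predicates:
            return True
        for successor in edges.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return False


def odd_walk_graph(E):
    """Digraph of the hypothetical odd-walk program on the edge relation E."""
    vertices = {v for edge in E for v in edge}
    edges = {}
    for x in vertices:
        for z in vertices:
            # Odd(x, y) <- Odd(x, z), E(z, w), E(w, y)
            out = {((x, y), "Odd")
                   for (z2, w) in E if z2 == z
                   for (w2, y) in E if w2 == w}
            if x == z:
                out.add(((x,), "Goal"))  # Goal(x) <- Odd(x, x)
            edges[((x, z), "Odd")] = out
    # Statements derivable in one step, without IDBs: Odd(x, y) <- E(x, y).
    start = {((x, y), "Odd") for (x, y) in E}
    return edges, start


def symmetric_edges(pairs):
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


triangle = symmetric_edges([(1, 2), (2, 3), (1, 3)])
path = symmetric_edges([(1, 2), (2, 3)])
print(reaches_goal(*odd_walk_graph(triangle), {"Goal"}),
      reaches_goal(*odd_walk_graph(path), {"Goal"}))
```

The search only needs to remember the current node and the set of visited nodes; the NL bound mentioned above comes from guessing the path node by node instead of storing the visited set.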

The exact characterization of structures A such that there is a linear Datalog
program deciding CSP(A) is open. A popular conjecture is that CSP(A)
can be solved by linear Datalog if and only if the algebra of polymorphisms 
of A is semidistributive. Barto, Kozik and Willard have proved that if A 
admits an NU polymorphism then CSP(A) can be solved by linear Datalog 
(see 0). 

For our purposes, it will be useful to notice that CSP(A) can be solved 
by a linear Datalog program if and only if A has bounded pathwidth duality. 

Definition 2. A CSP(A) instance I = (V, C) has pathwidth at most k if we
can cover V by a family of sets $U_1, \dots, U_m$ such that

• $|U_i| \le k + 1$ for each i,

• if $i < j$ and $v \in V$ lies in $U_i$ and $U_j$, then v also lies in each of
$U_{i+1}, \dots, U_{j-1}$, and

• for each constraint $C \in \mathcal{C}$ there is an i such that the image of the scope
of C lies wholly in $U_i$.

The name pathwidth comes from the fact that if we arrange the variables
in the order they appear in $U_1, \dots, U_m$ and look at the instance from far away,
the “bubbles” $U_1, \dots, U_m$ form a path. The length of the path is allowed to
be arbitrary, but the “width” (size of the bubbles and their overlaps) is
bounded.
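The three conditions of the definition of pathwidth are easy to verify mechanically for a given family of sets. The sketch below assumes the instance is given by its variables and the scopes of its constraints; `is_path_decomposition` and the example data are hypothetical names used only here.

```python
def is_path_decomposition(variables, scopes, bags, k):
    """Check that `bags` (the sets U_1, ..., U_m, in order) witness
    pathwidth at most k for an instance with the given constraint scopes."""
    bags = [set(bag) for bag in bags]
    # The bags must cover all variables.
    if not set(variables) <= set().union(*bags):
        return False
    # Every bag has at most k + 1 elements.
    if any(len(bag) > k + 1 for bag in bags):
        return False
    # Each variable occupies a contiguous interval of bags.
    for v in variables:
        hits = [i for i, bag in enumerate(bags) if v in bag]
        if hits and hits[-1] - hits[0] + 1 != len(hits):
            return False
    # The scope of every constraint fits inside a single bag.
    return all(any(set(scope) <= bag for bag in bags) for scope in scopes)


V = ["x1", "x2", "x3", "x4"]
path_scopes = [("x1", "x2"), ("x2", "x3"), ("x3", "x4")]
cycle_scopes = path_scopes + [("x4", "x1")]

print(is_path_decomposition(V, path_scopes, [{"x1", "x2"}, {"x2", "x3"}, {"x3", "x4"}], 1))
print(is_path_decomposition(V, cycle_scopes, [{"x1", "x2", "x4"}, {"x2", "x3", "x4"}], 2))
```

In this example the path of three binary constraints has pathwidth at most 1, while closing it into a 4-cycle requires bags of size 3, that is, the second family only witnesses pathwidth at most 2.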

We say that A has bounded pathwidth duality if there exists a constant
k such that for every unsatisfiable instance I of CSP(A) there exists an
unsatisfiable instance J of CSP(A) of pathwidth at most k such that we can
identify some variables of J to obtain a subinstance of I. (This is a translation
of the usual definition of duality, which talks about homomorphisms of
relational structures, to CSP instances.)

Proposition 3 ([S]). Assume that A is a relational structure. Then A has
bounded pathwidth duality if and only if there exists a linear Datalog program
deciding CSP(A).

Symmetric Datalog is a more restricted version of linear Datalog, where
we only allow symmetric linear rules: Any rule with no IDBs on the right
hand side is automatically symmetric, so the interesting case is when a rule
$\alpha$ has the form

\[ R(\rho) \leftarrow S(\sigma) \wedge T_1(\tau_1) \wedge T_2(\tau_2) \wedge \dots, \]

where R, S are (the only) IDBs. If a symmetric program P contains the rule
$\alpha$, then P must also contain the rule $\alpha'$ obtained from $\alpha$ by switching $R(\rho)$
and $S(\sigma)$ (we will call this rule the mirror image of $\alpha$):

\[ S(\sigma) \leftarrow R(\rho) \wedge T_1(\tau_1) \wedge T_2(\tau_2) \wedge \dots \]
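The mirror-image condition is purely syntactic, so it can be checked directly on a program. In the following sketch a linear rule is a triple (head atom, IDB atom of the body or None, frozenset of the remaining atoms); this encoding and the names `mirror`, `is_symmetric` and `symmetrize` are assumptions made for the illustration.

```python
def mirror(rule):
    """Swap the head with the IDB atom of the body."""
    head, idb, others = rule
    return (idb, head, others)


def is_symmetric(program):
    """Every rule with an IDB in its body must have its mirror image in the program."""
    rules = set(program)
    return all(idb is None or mirror((head, idb, others)) in rules
               for head, idb, others in rules)


def symmetrize(program):
    """Add the missing mirror images (this may change what the program derives)."""
    rules = set(program)
    return rules | {mirror(rule) for rule in rules if rule[1] is not None}


# R(x, y) <- E(x, y)              (no IDB in the body: automatically symmetric)
base = (("R", ("x", "y")), None, frozenset({("E", ("x", "y"))}))
# R(x, z) <- R(x, y), E(y, z)     (its mirror image is R(x, y) <- R(x, z), E(y, z))
step = (("R", ("x", "z")), ("R", ("x", "y")), frozenset({("E", ("y", "z"))}))

program = {base, step}
print(is_symmetric(program), is_symmetric(symmetrize(program)))
```

Note that `symmetrize` only produces a syntactically symmetric program; whether the added mirror rules are sound for a given CSP is a separate question, which is what the notion of consistency with A addresses.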

Observe that if P is a symmetric Datalog program, then $\mathcal{G}(P, I)$ is always
a symmetric graph. Therefore, deciding if $P(I) \vdash G$ is equivalent to an
undirected reachability problem. Evaluating symmetric Datalog programs is
thus in L thanks to Reingold’s celebrated result that undirected reachability
is in L [T7] .

We will often use Datalog programs whose predicates correspond to relations
on A. However, in doing so we will not restrict ourselves to just
the relations from the relational clone of A. If the predicates $R, S_1, \dots, S_\ell$
correspond to relations on A, then we say that the rule

\[ R^{P}(\rho) \leftarrow S_1^{P}(\sigma_1), S_2^{P}(\sigma_2), \dots, S_\ell^{P}(\sigma_\ell) \]

is consistent with A if the corresponding implication holds for all tuples of
A, i.e. the sentence

\[ \forall f\colon X \to A, \quad R^{\mathbb{A}}(f(\rho)) \Leftarrow \left( S_1^{\mathbb{A}}(f(\sigma_1)) \wedge S_2^{\mathbb{A}}(f(\sigma_2)) \wedge \dots \wedge S_\ell^{\mathbb{A}}(f(\sigma_\ell)) \right) \]

holds in the relational structure A (recall that X is the list of all variables used
in the rules of P). In other words, a consistent rule records an implication
that is true in A.
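Consistency of a single rule with A can be tested by enumerating all evaluations of its variables in A. The sketch below is a brute-force illustration; the function `is_consistent`, the atom encoding (predicate name, tuple of variables) and the example relations on {0, 1} are our own assumptions.

```python
from itertools import product


def is_consistent(head, body, relations, universe):
    """Check that for every f: X -> A, if all body atoms hold in A
    then the head atom holds as well."""
    variables = sorted({v for _, vs in [head, *body] for v in vs})
    for values in product(universe, repeat=len(variables)):
        f = dict(zip(variables, values))

        def holds(atom):
            predicate, vs = atom
            return tuple(f[v] for v in vs) in relations[predicate]

        if all(holds(atom) for atom in body) and not holds(head):
            return False
    return True


RELATIONS = {
    "NEQ": {(0, 1), (1, 0)},
    "EQ": {(0, 0), (1, 1)},
}
body = [("NEQ", ("x", "y")), ("NEQ", ("y", "z"))]

print(is_consistent(("EQ", ("x", "z")), body, RELATIONS, [0, 1]))
print(is_consistent(("NEQ", ("x", "z")), body, RELATIONS, [0, 1]))
```

On the two-element set, two disequalities in a row force equality, so the first rule is consistent; the second rule fails for x = z = 0, y = 1.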
For $r \in \mathbb{N}$, we construct the r-ary maximal symmetric Datalog program
consistent with A, denoted by $\mathcal{P}^r_{\mathbb{A}}$, as follows: The program has as predicates
all relations of arity at most r on A (these will be IDBs), plus a new symbol
for each basic relation of A of arity at most r (these symbols will correspond
to the relations used in constraints and they will never be IDBs; thus we
have two symbols for each basic relation of A, only one of which can be on
the left hand side of any rule).

The set of rules of $\mathcal{P}^r_{\mathbb{A}}$ will contain all rules $\alpha$ that

1. are valid linear Datalog rules (i.e. an IDB on the left side, at most one
IDB on the right),

2. use only tuples of variables from $X = \{x_1, \dots, x_r\}$ (i.e. at most r
variables at once),

3. don’t have any repetition on the right hand side, i.e. each statement
$R(\sigma)$ appears in $\alpha$ at most once (however, the predicate R can be used
several times with different tuples of variables),

4. are consistent with A, and

5. if $\alpha$ contains an IDB on the right hand side, then the mirror image $\alpha'$
of $\alpha$ is also consistent with A.

We will designate all empty relations of arity at most r as goal predicates.
We note that our $\mathcal{P}^r_{\mathbb{A}}$ is a variation of the notion of a canonical symmetric
Datalog program (used e.g. in [9]).

It is an easy exercise to show that $\mathcal{P}^r_{\mathbb{A}}(I) \vdash S(\sigma)$ if and only if $\mathcal{G}(\mathcal{P}^r_{\mathbb{A}}, I)$
contains a path from $(\rho, A)$ to $(\sigma, S)$ where A is the unary full relation on A
and $\rho$ is arbitrary. Starting with the full relation will help us simplify proofs
by induction later.

The set of rules of $\mathcal{P}^r_{\mathbb{A}}$ is large but finite because there are only so many
ways to choose a sequence of at most r-ary predicates on r variables without
repetition. Since A and r are not part of the input of CSP(A), we don’t mind
that $\mathcal{P}^r_{\mathbb{A}}$ contains numerous duplicate or useless rules.

When we run $\mathcal{P}^r_{\mathbb{A}}$ on a CSP(A) instance I, it attempts to narrow down
the set of images of r-tuples of variables using consistency:

Observation 4. Let A be a relational structure, $r \in \mathbb{N}$, and I = (V, C) an
instance of CSP(A). Then:

1. if $R \subseteq A^n$ and $\rho \in V^n$ are such that $\mathcal{P}^r_{\mathbb{A}}(I) \vdash R(\rho)$, then any solution f
of I must satisfy $f(\rho) \in R$,

2. if $\mathcal{P}^r_{\mathbb{A}}(I) \vdash G$, then I is not satisfiable.

Proof. To prove the first claim, consider a path in $\mathcal{G}(\mathcal{P}^r_{\mathbb{A}}, I)$ that witnesses
$\mathcal{P}^r_{\mathbb{A}}(I) \vdash R(\rho)$:

\[ (\rho_1, S_1), (\rho_2, S_2), \dots, (\rho_m, S_m) = (\rho, R), \]

with $S_1 = A$.

We claim that if f is a solution of I, then for each $i = 1, \dots, m$ we must
have $f(\rho_i) \in S_i$. We proceed by induction. For i = 1, this is trivial.

Assume now that $f(\rho_i) \in S_i$ and that $\mathcal{P}^r_{\mathbb{A}}$ contains a rule $\alpha$ of the form

\[ S_{i+1}(\rho_{i+1}) \leftarrow S_i(\rho_i), T_1(\tau_1), \dots, T_k(\tau_k), \]

where $(\tau_j, T_j) \in C$ for $j = 1, \dots, k$. Since $T_j(\tau_j)$ are constraints of I, we have
$f(\tau_j) \in T_j$ for each j. From the fact that $\alpha$ is a rule consistent with A, it
follows that $f(\rho_{i+1}) \in S_{i+1}$.

The second statement of the Observation is a consequence of the first, since
reaching a goal predicate means that $\mathcal{P}^r_{\mathbb{A}}(I) \vdash \emptyset(\rho)$ for some tuple $\rho$ of variables
in V. Using (1), we get that each solution of I must satisfy the impossible
condition $f(\rho) \in \emptyset$ and so there can’t be any solution f. □

By Observation [4], the only way $\mathcal{P}^r_{\mathbb{A}}$ can fail to decide CSP(A) is if there is
an unsatisfiable instance I of CSP(A) for which $\mathcal{P}^r_{\mathbb{A}}$ does not derive G. Our
goal in the rest of the paper is to show that for r, s large enough and A nice
enough such a situation will not happen.

Let us fix a positive integer n. We say that a variety V is (congruence)
n-permutable if for any algebra A in V and any pair of congruences
$\alpha, \beta \in \operatorname{Con} A$ it is true that

\[ \alpha \vee \beta = \alpha \circ \beta \circ \alpha \circ \beta \circ \cdots \]

with n − 1 composition symbols on the right side (in particular, 2-permutable
is the same thing as congruence permutable).

A standard free algebra argument gives us that V is n-permutable if and
only if we can find idempotent terms $p_0, p_1, \dots, p_n$ such that

\[
\begin{aligned}
x &= p_0(x, y, z), \\
p_i(x, x, y) &= p_{i+1}(x, y, y) \quad \text{for all } i = 0, 1, \dots, n-1, \\
p_n(x, y, z) &= z.
\end{aligned}
\]

The above terms are called Hagemann-Mitschke terms and were first obtained
in [T3] .
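For concrete operations on a small set, the Hagemann-Mitschke identities can be verified exhaustively. The sketch below checks the linking identities for i = 0, ..., n − 1, which is the usual convention (the cases i = 0 and i = n − 1 tie the chain to the two projections). The function name and the example, the Maltsev operation x − y + z on the integers modulo 3 giving a chain with n = 2, are our own illustrations.

```python
from itertools import product


def is_hagemann_mitschke_chain(terms, universe):
    """terms = [p_0, ..., p_n], each a ternary function on `universe`."""
    universe = list(universe)
    n = len(terms) - 1
    # p_0 is the first projection and p_n is the third projection.
    for x, y, z in product(universe, repeat=3):
        if terms[0](x, y, z) != x or terms[n](x, y, z) != z:
            return False
    # Linking identities p_i(x, x, y) = p_{i+1}(x, y, y).
    for x, y in product(universe, repeat=2):
        for i in range(n):
            if terms[i](x, x, y) != terms[i + 1](x, y, y):
                return False
    # Idempotency: p(a, a, a) = a.
    return all(p(a, a, a) == a for p in terms for a in universe)


def first(x, y, z):
    return x


def third(x, y, z):
    return z


def maltsev_mod3(x, y, z):
    return (x - y + z) % 3


print(is_hagemann_mitschke_chain([first, maltsev_mod3, third], range(3)))
print(is_hagemann_mitschke_chain([first, third], range(3)))
```

The second call fails because a chain with n = 1 would force the first and third projections to agree, which only happens on a one-element set.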

If the algebra of polymorphisms of a relational structure A generates
an n-permutable variety, i.e. if there are Hagemann-Mitschke operations
$p_0, \dots, p_n$ in A, we say simply that A is n-permutable.

Let us close this section by talking about necessary conditions for CSP(A) 
to be solvable by symmetric Datalog. An obvious condition is that, since 
symmetric Datalog is a subset of linear Datalog, CSP(A) must be solvable 
by linear Datalog. 

A relational structure A is a core if any unary polymorphism $f\colon A \to A$ is
an automorphism (i.e. we can’t retract A to a smaller relational structure).
To classify the complexity of CSP(A) for A finite, it is enough to classify
cores, see [16, p. 142].

However, if A is a core, then for CSP(A) to be solvable by symmetric
Datalog, A must omit all tame congruence theory types except for type 3
(Boolean) [15, Theorem 4.2], from which it follows [H] that A must be
n-permutable for some n.

Proposition 5. If A is a core relational structure such that CSP(A) is solvable
by symmetric Datalog, then A is n-permutable for some n and CSP(A)
is solvable by linear Datalog.

Our goal in this paper is to prove that the conditions of Proposition [5] are 
also sufficient: 

Theorem 6. Let A be a relational structure such that there is a linear Datalog
program that decides CSP(A) and A admits a chain of n Hagemann-Mitschke
terms as polymorphisms. Then there exists an $r \in \mathbb{N}$ such that $\mathcal{P}^r_{\mathbb{A}}$ decides
CSP(A).

3 Stacking symmetric Datalog programs 

In this section we describe two tricks that allow us essentially to run one 
Datalog program from inside another (at the cost of decreasing the number 
of variables available to the program). 

The first lemma of this section is basically [H Lemma 11] rewritten in our
formalism:
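Lemma 7 (V. Dalmau, B. Larose). Let A be a relational structure, I = (V, C)
an instance of CSP(A), let $S \subseteq A^s$, $R \subseteq A^r$ be two relations, and let $\sigma \in V^s$
and $\rho \in V^r$. Assume that $\mathcal{P}^s_{\mathbb{A}}(I) \vdash S(\sigma)$.

Then for any $k \ge r + s$ we have

\[ \mathcal{P}^k_{\mathbb{A}}(I) \vdash R(\rho) \iff \mathcal{P}^k_{\mathbb{A}}(I) \vdash R(\rho) \wedge S(\sigma). \]

Proof. Let

\[ U_1(\varphi_1), U_2(\varphi_2), \dots, U_m(\varphi_m) = S(\sigma) \]

be a path in $\mathcal{G}(\mathcal{P}^s_{\mathbb{A}}, I)$ witnessing $\mathcal{P}^s_{\mathbb{A}}(I) \vdash S(\sigma)$. Then it is easy to verify that

\[ R(\rho), R(\rho) \wedge U_1(\varphi_1), R(\rho) \wedge U_2(\varphi_2), \dots, R(\rho) \wedge U_m(\varphi_m) = R(\rho) \wedge S(\sigma) \]

is a path in the graph $\mathcal{G}(\mathcal{P}^k_{\mathbb{A}}, I)$. Therefore, $\mathcal{P}^k_{\mathbb{A}}(I)$ derives $R(\rho)$ if and only if
it derives $R(\rho) \wedge S(\sigma)$. □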

Repeated use of Lemma [7] gets us the following: 

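Corollary 8. Let A be a relational structure, I a CSP(A) instance. Let
$S_1, \dots, S_p$ and R be relations on A and $\sigma_1, \dots, \sigma_p, \rho$ be tuples of variables
from I.

If $\mathcal{P}^s_{\mathbb{A}}(I) \vdash S_j(\sigma_j)$ for $j = 1, \dots, p$, and both $|\rho|$ and
$|\operatorname{Im} \rho \cup \bigcup_{i=1}^{p} \operatorname{Im} \sigma_i|$ are at most r, then we have:

\[
\begin{aligned}
\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash R(\rho)
&\iff \mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash R(\rho) \wedge S_1(\sigma_1) \\
&\iff \mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash R(\rho) \wedge S_1(\sigma_1) \wedge S_2(\sigma_2) \\
&\;\;\vdots \\
&\iff \mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash R(\rho) \wedge S_1(\sigma_1) \wedge \dots \wedge S_p(\sigma_p).
\end{aligned}
\]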

Definition 9. Given an instance I = (V, C) of CSP(A), we say that $\mathcal{P}^r_{\mathbb{A}}$
derives the instance J = (W, D) from I, writing $\mathcal{P}^r_{\mathbb{A}}(I) \vdash J$, if $W \subseteq V$ and
for each $(\sigma, R) \in D$ we have $\mathcal{P}^r_{\mathbb{A}}(I) \vdash R(\sigma)$.

Obviously, if $\mathcal{P}^r_{\mathbb{A}}$ derives an unsatisfiable instance from I, then I itself is
unsatisfiable. Moreover, a maximal symmetric Datalog program run on I
can simulate the run of a smaller maximal symmetric Datalog program on
J:
Lemma 10. Let $A = (A, R_1, \dots, R_n)$ and $B = (A, S_1, \dots, S_m)$ be two relational
structures and let I = (V, C) be an instance of CSP(A). Assume that
r, s are positive integers and J = (W, D) is an instance of CSP(B) such that
$\mathcal{P}^r_{\mathbb{A}}(I) \vdash J$ and $\mathcal{P}^s_{\mathbb{B}}(J) \vdash G$. Then $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash G$.

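Proof. The derivation of $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash G$ will follow the derivation $\mathcal{P}^s_{\mathbb{B}}(J) \vdash G$,
generating the constraints of J on the fly using $\mathcal{P}^r_{\mathbb{A}}(I)$. Note that since A
and B share the same base set, the predicates of $\mathcal{P}^s_{\mathbb{B}}$ are also predicates of
$\mathcal{P}^{r+s}_{\mathbb{A}}$.

Let $U_1(\varphi_1), U_2(\varphi_2), \dots, U_q(\varphi_q)$ be a derivation of G by $\mathcal{P}^s_{\mathbb{B}}(J)$ such that
$U_1 = A$.

We proceed by induction on i from 1 to q and show that $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_i(\varphi_i)$
for all i. Since all goal predicates of $\mathcal{P}^s_{\mathbb{B}}$ are also goal predicates of $\mathcal{P}^{r+s}_{\mathbb{A}}$, this
will show that $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash G$. The base case is easy: Since $U_1$ is full, $\mathcal{P}^{r+s}_{\mathbb{A}}$ has
the rule “$U_1(\varphi_1) \leftarrow$”, giving us $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_1(\varphi_1)$.

Assume that $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_i(\varphi_i)$. Since $\mathcal{P}^s_{\mathbb{B}}(J)$ derives $U_{i+1}(\varphi_{i+1})$ from $U_i(\varphi_i)$,
there have to be numbers $j_1, \dots, j_p$ and tuples $\sigma_1, \dots, \sigma_p$ such that each
$(\sigma_k, S_{j_k})$ is a constraint of J, and

\[ U_{i+1}(\varphi_{i+1}) \leftarrow U_i(\varphi_i), S_{j_1}(\sigma_1), S_{j_2}(\sigma_2), \dots, S_{j_p}(\sigma_p) \]

is a rule of $\mathcal{P}^s_{\mathbb{B}}$. From this, it is easy to verify that the following rule, which
we will call $\alpha$, is a rule of $\mathcal{P}^{r+s}_{\mathbb{A}}$:

\[ \left( U_{i+1}(\varphi_{i+1}) \wedge S_{j_1}(\sigma_1) \wedge \dots \wedge S_{j_p}(\sigma_p) \right) \leftarrow \left( U_i(\varphi_i) \wedge S_{j_1}(\sigma_1) \wedge \dots \wedge S_{j_p}(\sigma_p) \right). \]

Starting from $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_i(\varphi_i)$, we use Corollary [8] to get
$\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_i(\varphi_i) \wedge \bigwedge_{k=1}^{p} S_{j_k}(\sigma_k)$. We then use the rule $\alpha$ to obtain
$\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_{i+1}(\varphi_{i+1}) \wedge \bigwedge_{k=1}^{p} S_{j_k}(\sigma_k)$ and finally use the other implication
from Corollary [8] to get $\mathcal{P}^{r+s}_{\mathbb{A}}(I) \vdash U_{i+1}(\varphi_{i+1})$, concluding the proof. □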

At one point, we will need to look at powers of A. For this, we introduce
the following notation: If

\[ \sigma = ((s_{1,1}, \dots, s_{1,k}), \dots, (s_{\ell,1}, \dots, s_{\ell,k})) \in (A^k)^\ell \]

is an $\ell$-tuple of elements of $A^k$ then by $\overline{\sigma}$ we will mean the $k\ell$-tuple we get
by “unpacking” $\sigma$ into $A^{k\ell}$:

\[ \overline{\sigma} = (s_{1,1}, \dots, s_{1,k}, \dots, s_{\ell,1}, \dots, s_{\ell,k}). \]

If $U \subseteq (A^k)^\ell$ is a relation on $A^k$ we will denote by $\overline{U}$ the relation
$\overline{U} = \{\overline{\sigma} : \sigma \in U\} \subseteq A^{k\ell}$.

The following lemma generalizes Lemma [10] to powers of A. The proof is
similar to that of Lemma [10] and we omit it for brevity.

Lemma 11. Let $k \in \mathbb{N}$ and assume we have relational structures A and B
on the sets A and $A^k$ respectively. Assume moreover that I = (V, C) is an
instance of CSP(A), $S_1, \dots, S_m$ are basic relations of B, $\sigma_1, \dots, \sigma_m$ are tuples
of elements of $V^k$, and r, s are positive integers such that:

1. $\mathcal{P}^r_{\mathbb{A}}(I) \vdash \overline{S_i}(\overline{\sigma_i})$ for each $i = 1, \dots, m$,

2. $\mathcal{P}^s_{\mathbb{B}}(J) \vdash G$, where J is the instance $J = (V^k, \{(\sigma_i, S_i) \mid i = 1, \dots, m\})$
of CSP(B).

Then $\mathcal{P}^{r+ks}_{\mathbb{A}}(I) \vdash G$.

4 n-permutability on path instances 

We begin our construction by showing how n-permutability limits the kind 
of CSP instances a symmetric Datalog program can encounter. 

Definition 12. An instance I = (V, C) of CSP is a path instance of length $\ell$
if:

(a) V is a linearly ordered set (we use $V = [\ell]$ ordered by size whenever
practicable, such as in the rest of this definition),

(b) for each $i \in V$, I contains exactly one unary constraint with the scope i;
we will denote its constraint relation by $B_i \subseteq A$,

(c) for each $i = 1, 2, \dots, \ell - 1$, I contains exactly one binary constraint with
the scope $(i, i + 1)$; we denote its constraint relation $B_{i,i+1}$,

(d) I contains no other constraints than the ones named above.

Note that $B_{i,i+1}$ can contain tuples from outside of $B_i \times B_{i+1}$. We allow
that to happen to simplify our later arguments.

If I is a path instance of length $\ell$ and $a \le b$ are integers, we define the
instance I restricted to [a, b] as the subinstance of I induced by all variables
of I from the a-th to the b-th (inclusive). We will denote I restricted to [a, b]
by $I_{[a,b]}$.
Figure 2: A sketch of a 4-braid. The solution t from Observation [14] pictured
as a zigzag.

Definition 13. Let I be a path CSP instance on $[\ell]$. An n-braid (see Figure [2])
in I is a collection of n + 1 solutions $s_0, s_1, s_2, \dots, s_n$ of I together with
indices $1 \le i_1 < \dots < i_n \le \ell$ such that for all $k = 1, 2, \dots, n - 1$ we have

1. $s_{k+1}(i_k) = s_k(i_k)$, and

2. $s_{k-1}(i_{k+1}) = s_k(i_{k+1})$.

When we want to explicitly describe a braid, we will often give the tuple
$(s_0, \dots, s_n; i_1, \dots, i_n)$.

We care about braids because it is easy to apply Hagemann-Mitschke
terms to them to get new solutions of I. This observation is not new; one
can find it formulated in a different language in [TSl Theorem 8.4.]:

Observation 14 (R. Freese, M. Valeriote). Let $n \in \mathbb{N}$ and let A be an
n-permutable algebra, I be a linear instance of CSP(A), and let
$(s_0, \dots, s_n; i_1, \dots, i_n)$ be an n-braid in I. Then there exists a solution t of I such that
$t(i_1) = s_0(i_1)$ and $t(i_n) = s_n(i_n)$.
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Proof. Since A is n-permutable, we have a chain of Hagemann-Mitschke 
terms Po,Pi, ■ ■ ■ ,Pn compatible with constraints of I. All we need to do is 
apply these terms on sq, si,..., s„. 

Denote by rk the mapping rk{i) = pk{sk-i{i), Sk{i), Sk+i{i)) where k goes 
from 1 to n — 1; we let To = sq and Vn = Sn- Since pk is a polymorphism, each 
Tk is a solution of I. Moreover, one can verify using the Hagemann-Mitschke 
equations together with the equalities from the dehnition of an n-braid that 
for each k = 1,... ,n we have rk-\{ik) = ^kiik)- 

Since / is a linear instance, we can glue the solutions tq, ... ,rn together: 
The mapping t dehned as t{i) = rk{i) whenever ik < i < 4+i (where we put 
z_i = 0 and z„+i = I for convenience) is a solution of I. To finish the proof, 
it remains to observe that t{ii) = So(h) and □ 
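
The gluing step of this proof can be tried out on a toy example. The following Python sketch is only an illustration under stated assumptions: it takes the Hagemann-Mitschke identities in the usual form $p_0(x,y,z) = x$, $p_k(x,x,z) = p_{k+1}(x,z,z)$, $p_n(x,y,z) = z$, uses the hypothetical 2-permutable example of $\mathbb{Z}_4$ with $p_1(x,y,z) = x - y + z$, and takes every binary constraint to be "same parity". The names `glue_braid`, `is_solution` and `same_parity` are made up for this sketch.

```python
def glue_braid(terms, sols, idx):
    """Glue an n-braid into one solution t.

    terms = [p_0, ..., p_n], sols = [s_0, ..., s_n] (position 1 is stored
    at list index 0), idx = [i_1, ..., i_n].
    """
    n = len(idx)
    length = len(sols[0])
    # r_0 = s_0, r_n = s_n, and r_k = p_k(s_{k-1}, s_k, s_{k+1}) pointwise.
    r = [list(sols[0])]
    for k in range(1, n):
        r.append([terms[k](a, b, c)
                  for a, b, c in zip(sols[k - 1], sols[k], sols[k + 1])])
    r.append(list(sols[n]))
    t = []
    for pos in range(1, length + 1):
        # Largest k with i_k <= pos, using the convention i_0 = 0.
        k = sum(1 for i in idx if i <= pos)
        t.append(r[k][pos - 1])
    return t


def is_solution(s, edge_ok):
    """Check every consecutive pair of values against the binary constraint."""
    return all(edge_ok(s[p], s[p + 1]) for p in range(len(s) - 1))


def same_parity(a, b):
    """Hypothetical constraint relation, invariant under x - y + z mod 4."""
    return (a - b) % 2 == 0


# Chain p_0, p_1, p_2 for the assumed example (n = 2).
terms = [lambda x, y, z: x,
         lambda x, y, z: (x - y + z) % 4,
         lambda x, y, z: z]

# A 2-braid with i_1 = 2, i_2 = 3: s_1(2) = s_2(2) and s_0(3) = s_1(3).
s0, s1, s2 = [0, 2, 0, 2], [2, 2, 0, 0], [2, 2, 2, 0]
t = glue_braid(terms, [s0, s1, s2], [2, 3])
```

In this example the glued mapping agrees with $s_0$ at position $i_1 = 2$ and with $s_2$ at position $i_2 = 3$, as in the statement of the observation.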

Let $I$ be a path instance of CSP. We will say that a binary constraint $B_{i,i+1}$
of $I$ is subdirect if $B_i \subseteq \pi_1(B_{i,i+1})$ and $B_{i+1} \subseteq \pi_2(B_{i,i+1})$.
(We have modified the standard definition of subdirectness a bit to account
for the fact that $B_{i,i+1}$ can contain tuples outside of $B_i \times B_{i+1}$.) An instance
is subdirect if all its constraints are subdirect. It is easy to verify that if $I$ is a
subdirect path instance and $e \in B_{i,i+1} \cap (B_i \times B_{i+1})$, then by walking from $e$
backwards and forwards along the edges defined by the binary constraints of
$I$ we get a solution $s$ of $I$ that contains the edge $e$, that is, $(s(i), s(i+1)) = e$.
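
The walking argument can be sketched in a few lines of Python. This is an illustration only, under the assumption that subdirectness is read as "every value of $B_i$ has a partner in $B_{i+1}$ through $B_{i,i+1}$, and vice versa". The encoding of an instance as two dictionaries (`unary[i]` for $B_i$, `binary[i]` for $B_{i,i+1}$) and the function names are hypothetical.

```python
def is_subdirect(unary, binary):
    """Check the assumed reading of subdirectness for every binary constraint."""
    for i, rel in binary.items():
        # Only edges inside B_i x B_{i+1} count.
        inside = {(a, b) for (a, b) in rel
                  if a in unary[i] and b in unary[i + 1]}
        if {a for a, _ in inside} != set(unary[i]):
            return False
        if {b for _, b in inside} != set(unary[i + 1]):
            return False
    return True


def extend_edge(unary, binary, i, edge):
    """Walk forwards and backwards from `edge` in B_{i,i+1} to get a solution."""
    length = len(unary)
    s = {i: edge[0], i + 1: edge[1]}
    for k in range(i + 1, length):          # forwards from position i + 1
        s[k + 1] = next(b for (a, b) in sorted(binary[k])
                        if a == s[k] and b in unary[k + 1])
    for k in range(i - 1, 0, -1):           # backwards from position i
        s[k] = next(a for (a, b) in sorted(binary[k])
                    if b == s[k + 1] and a in unary[k])
    return [s[k] for k in range(1, length + 1)]


# A small subdirect path instance of length 3 over the values {0, 1}.
unary = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}}
binary = {1: {(0, 0), (1, 0), (1, 1)}, 2: {(0, 1), (1, 0)}}
solution = extend_edge(unary, binary, 2, (1, 0))
```

On a subdirect instance the two `next(...)` calls always find a partner; on a non-subdirect instance they may raise `StopIteration`, which is exactly the failure that subdirectness rules out.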

The following lemma tells us that if a path instance $I$ is subdirect and
we mark enough edges in $I$, we can find an n-braid that goes through many
edges of our choosing. It is a Ramsey-like result and we prove it using the
Ramsey theorem.

Lemma 15. For every n and N there exists an m with the following property:
Let $I$ be a subdirect path CSP instance of length $\ell$ such that $|B_i| \le N$ for
each $i \in [\ell]$. Then for any choice of indices $1 \le j_1 < j_2 < \dots < j_m < \ell$ and
edges $e_k \in B_{j_k,j_k+1} \cap (B_{j_k} \times B_{j_k+1})$ for $k = 1, \dots, m$, there exists an n-braid
$(s_0, \dots, s_n; i_1, \dots, i_n)$ in $I$ such that for every $k = 1, 2, \dots, n-1$ there is a
$q$ so that $i_k \le j_q < i_{k+1}$ and $(s_k(j_q), s_k(j_q+1)) = e_q$ (that is, between every
pair of “crossings” is an edge $e_q$; see Figure 3).

Proof. Without loss of generality we can assume that $B_i \subseteq [N]$ for each $i$.
For each $k = 1, \dots, m$, we choose and fix a solution $\sigma_k$ of $I$ that contains the
edge $e_k$ (which we get from subdirectness of $I$; see above).

Consider now the complete graph $G$ with vertex set $[m]$ whose edges are
colored as follows: For every $u < v$ we color the edge $\{u,v\} \in \binom{[m]}{2}$ by the pair
of numbers $(\sigma_u(j_v), \sigma_v(j_u)) \in [N]^2$. By the Ramsey theorem, if $m$ is large
enough then there exists a monochromatic induced subgraph of $G$ on $2n+1$
vertices. To make our notation simpler, we will assume that these vertices
are $1, 2, \dots, 2n+1$.

Figure 3: The conclusion of Lemma 15 for n = 3. Important edges $e_k$ drawn
in bold.

Thanks to edges of $G$ being monochromatic on $[2n+1]$, we have that $\sigma_u$
and $\sigma_{u'}$ agree on $j_v$ as long as $u, u', v \in [2n+1]$ and either $u, u' < v$, or $u, u' > v$.
Using this, we can easily verify that $(\sigma_1, \sigma_3, \dots, \sigma_{2n+1}; j_2, j_4, \dots, j_{2n})$ is an
n-braid. For each $k = 1, 2, \dots, n-1$ we get:

$\sigma_{2k+1}(j_{2k}) = \sigma_{2k+3}(j_{2k})$

$\sigma_{2k+1}(j_{2k+2}) = \sigma_{2k-1}(j_{2k+2})$

To finish the proof, observe that for every $k \in [n-1]$ we have $j_{2k} < j_{2k+1} < j_{2k+2}$
and the solution $\sigma_{2k+1}$ was chosen so that it passes through $e_{2k+1}$, so
we can let $q = 2k+1$ and satisfy the conclusion of the lemma. □

Given a path instance $I$, we will define the sets $C_i \subseteq B_i$ by $C_1 = B_1$ and
$C_{i+1} = \{b \in B_{i+1} : \exists c \in C_i,\ (c,b) \in B_{i,i+1}\}$.

The sets $C_i$ correspond to the endpoints of solutions of $I_{[1,i]}$, so $I$ is
satisfiable if and only if $C_\ell \neq \emptyset$. We will call an edge $(b,c) \in B_{i,i+1}$ such that
$b \in B_i \setminus C_i$ and $c \in C_{i+1}$ a backward edge.
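
As a concrete illustration of these definitions, the following Python sketch computes the sets $C_i$ and the backward edges of a small path instance. The dictionary encoding (`unary[i]` for $B_i$, `binary[i]` for $B_{i,i+1}$) and the function names are assumptions made for this example only.

```python
def forward_sets(unary, binary):
    """C_1 = B_1 and C_{i+1} = {b in B_{i+1} : (c, b) in B_{i,i+1} for some c in C_i}."""
    length = len(unary)
    C = {1: set(unary[1])}
    for i in range(1, length):
        C[i + 1] = {b for b in unary[i + 1]
                    if any((c, b) in binary[i] for c in C[i])}
    return C


def backward_edges(unary, binary, C):
    """For each i, the edges (b, c) of B_{i,i+1} with b in B_i \\ C_i and c in C_{i+1}."""
    return {i: {(b, c) for (b, c) in binary[i]
                if b in unary[i] - C[i] and c in C[i + 1]}
            for i in binary}


# A hypothetical unsatisfiable path instance of length 4 over {0, 1}.
unary = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
binary = {1: {(0, 0)}, 2: {(0, 0), (1, 0), (1, 1)}, 3: {(1, 1)}}
C = forward_sets(unary, binary)
back = backward_edges(unary, binary, C)
```

Here $C_4$ is empty, so the instance has no solution, and the only backward edge is $(1, 0)$ in $B_{2,3}$.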

Our goal in Section 6 will be to show how to use symmetric Datalog to
identify unsatisfiable path CSP(A) instances for A fixed and n-permutable.
We will see that in the absence of backward edges a simple symmetric Datalog
program can identify all unsatisfiable path CSP instances. This is why we
want to know what happens when there are many backward edges. It turns
out that having too many backward edges means that an n-permutable instance
is no longer subdirect. In Section 6, this will enable us to reduce the
size of the instance.

Lemma 16. For every n and N there exists an m such that if $I$ is a path
instance of length $\ell$ and $1 \le a < b \le \ell$ are such that

1. each set $B_i$ has cardinality at most N, and

2. all sets $B_i$ and all relations $B_{i,i+1}$ are invariant under a chain of n
Hagemann-Mitschke terms, and

3. there are at least m distinct indices $j$ in $[a,b)$ such that $B_{j,j+1}$ contains
a backward edge,

then the instance $I_{[a,b]}$ is not subdirect.

Proof. We pick m large enough to be able to use Lemma 15 for sets $B_i$ of
maximum size N and (n+1)-braids. Taking this m, we look at what would
happen were $I_{[a,b]}$ subdirect.

Let $a \le j_1 < \dots < j_m < b$ be a list of indices where backward edges
occur in $[a,b)$. For each $k = 1, \dots, m$ we choose a backward edge $e_{j_k} \in B_{j_k,j_k+1}$
and apply Lemma 15. We obtain an (n+1)-braid consisting of
solutions $s_0, \dots, s_{n+1}$ and indices $i_1, i_2, \dots, i_{n+1}$ in $I_{[a,b]}$ that uses n + 1 of
our backward edges. Moreover, since $s_1$ passes through a backward edge $e_j$
for some $j \in [i_1, i_2)$, we get $s_1(i_2) \in C_{i_2}$. Since the only condition on $s_0$ is
$s_0(i_2) = s_1(i_2)$, we can modify $s_0$ to ensure $s_0(i_1) \in C_{i_1}$ without breaking the
braid. The situation is sketched in Figure 4.

Observation 14 then gives us that $I_{[a,b]}$ has a solution $t$ such that $t(i_1) =
s_0(i_1) \in C_{i_1}$ and $t(i_n) = s_n(i_n) = s_{n+1}(i_n)$ (shown by a dashed line in Figure 4).

Now it remains to see that since $t(i_n) = s_{n+1}(i_n)$, there is a path from
$t(i_n)$ to some backward edge $e_j$, $j > i_{n+1}$. Therefore, $t(i_n) \in B_{i_n} \setminus C_{i_n}$ and
the solution $t$ witnesses that there is a path from $t(i_1) \in C_{i_1}$ to $t(i_n) \notin C_{i_n}$, a
contradiction with the way we have chosen the sets $C_i$. □

Figure 4: A schematic view of the instance (the ellipses are the sets $C_i$,
backward edges $e_j$ are thick).

5 Undirected reachability on path instances 

Given a path CSP instance $I$, we define the digraph Conn($I$) of $I$ as
the graph with vertex set equal to the disjoint union of all unary constraints
$B_1, \dots, B_\ell$ and edge set equal to the disjoint union of all binary constraints
of $I$ (restricted to the $B_i$s). Given a path CSP instance $I$ and numbers $i \le j$,
the relation $\lambda_{I,i,j}$ consists of all pairs $a \in B_i$, $b \in B_j$ such that there is a path
from $a$ to $b$ in Conn($I_{[i,j]}$).
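
To make the definition concrete, here is a minimal Python sketch that builds Conn($I_{[i,j]}$) as an undirected graph and computes the reachability relation by breadth-first search, reading "path" as a path that may traverse edges in either direction (as the section title suggests). The dictionary encoding of the instance and the names `conn_graph` and `lam` are assumptions of this example.

```python
from collections import deque


def conn_graph(unary, binary, lo, hi):
    """Undirected adjacency lists of Conn(I_[lo,hi]); vertices are (level, value)."""
    adj = {(k, a): set() for k in range(lo, hi + 1) for a in unary[k]}
    for k in range(lo, hi):
        for (a, b) in binary[k]:
            # Keep only the edges that stay inside B_k x B_{k+1}.
            if a in unary[k] and b in unary[k + 1]:
                adj[(k, a)].add((k + 1, b))
                adj[(k + 1, b)].add((k, a))
    return adj


def lam(unary, binary, i, j):
    """All pairs (a, b) in B_i x B_j joined by a path inside levels i..j."""
    adj = conn_graph(unary, binary, i, j)
    result = set()
    for a in unary[i]:
        seen = {(i, a)}
        queue = deque([(i, a)])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result |= {(a, b) for (level, b) in seen if level == j}
    return result


# A hypothetical path instance of length 4 over {0, 1}.
unary = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
binary = {1: {(0, 0)}, 2: {(0, 0), (1, 0), (1, 1)}, 3: {(1, 1)}}
```

In this instance the value 0 on the first level reaches the value 1 on the fourth level only by going back through the edge $(1,0)$ of $B_{2,3}$, so the pair $(0,1)$ is connected although the instance has no solution.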

Lemma 17. If $I$ is a path CSP instance of CSP(A) and $i \le j$, then $\lambda_{I,i,j}$
lies in the relational clone of A.

Proof. Let us orient the graph Conn($I_{[i,j]}$) so that edges always go from $B_k$
to $B_{k+1}$. This establishes levels on the graph ($B_i$ is on level 0, $B_{i+1}$
on level 1, and so on).

It is easy to see that for $a \in B_i$ and $b \in B_j$ we have $(a,b) \in \lambda_{I,i,j}$ if and
only if there is a digraph homomorphism $h\colon P \to$ Conn($I_{[i,j]}$), where $P$ is an
oriented path which starts at level 0, ends at level $j-i$, has no vertex of level
less than 0 or more than $j-i$, and $h$ maps the starting point of $P$ to $a$ and
the ending point of $P$ to $b$.

Let now the path $P$ witness $(a,b) \in \lambda_{I,i,j}$ and the path $Q$ witness $(c,d) \in \lambda_{I,i,j}$.
By [ini Lemma 2.36], $P \times Q$ then contains an oriented path $R$ that goes
from level 0 to level $j-i$. By considering projections of $P \times Q$, we obtain that
$R$ homomorphically maps to both $P$ and $Q$ and from this it is easy to verify
that $R$ witnesses both $(a,b), (c,d) \in \lambda_{I,i,j}$. Since there are only finitely many
pairs in $\lambda_{I,i,j}$, we can repeat this procedure to find a path $S$ that witnesses
the whole $\lambda_{I,i,j}$. It is then straightforward to translate homomorphisms from
$S$ to Conn($I_{[i,j]}$) into a primitive positive definition of $\lambda_{I,i,j}$ in A. □

Lemma 18. For every relational structure A, every path instance $I$ of CSP(A),
and every $i \le j$, we have $\mathcal{P}^3_A(I) \vdash \lambda_{I,i,j}(i,j)$.

Proof. Let us fix $i$ and $j$. For $k \in \{i, i+1, \dots, j\}$, consider the relation

$\rho_k = \{(a,b) \in B_i \times B_k :$ there is a path from $a$ to $b$ in Conn($I$) that
uses only vertices in $B_i, B_{i+1}, \dots, B_j\}$.

We show by induction on $k$ that $\mathcal{P}^3_A(I) \vdash \rho_k(i,k)$ for every $k = i, \dots, j$. This
will be enough, since $\rho_j = \lambda_{I,i,j}$.

The base case $k = i$ is easy: Since $\rho_i \supseteq \{(b,b) : b \in B_i\}$, the rule
$\rho_i(x,x) \leftarrow B_i(x)$ is present in $\mathcal{P}^3_A$, so we get $\mathcal{P}^3_A(I) \vdash \rho_i(i,i)$.

The induction step: Assume we have $\mathcal{P}^3_A(I) \vdash \rho_k(i,k)$. Given the definition
of $\rho_k$ and $\rho_{k+1}$, it is straightforward to verify that the pair of rules

$\rho_{k+1}(x,z) \leftarrow \rho_k(x,y) \wedge B_{k,k+1}(y,z)$
$\rho_k(x,y) \leftarrow \rho_{k+1}(x,z) \wedge B_{k,k+1}(y,z)$

is consistent with A and therefore present in $\mathcal{P}^3_A$. Applying the first of those
rules (with $x = i$, $y = k$, and $z = k+1$) then gives us $\mathcal{P}^3_A(I) \vdash \rho_{k+1}(i,k+1)$,
completing the proof. □
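
The symmetric pair of rules from this proof can be illustrated on the level of values: starting from the diagonal of $B_i$ and closing under both rules yields the relations of pairs joined by an undirected path. The Python sketch below is only a semantic illustration under stated assumptions (in the paper the rules act on the variables of the instance, not on values; the binary constraints are restricted to $B_k \times B_{k+1}$ here; the dictionary encoding and the name `rho_by_rules` are hypothetical).

```python
def rho_by_rules(unary, binary, i, j):
    """Least fixpoint of the forward and backward rule on levels i..j."""
    edges = {k: {(a, b) for (a, b) in binary[k]
                 if a in unary[k] and b in unary[k + 1]}
             for k in range(i, j)}
    rho = {k: set() for k in range(i, j + 1)}
    rho[i] = {(b, b) for b in unary[i]}          # base rule: rho_i(x, x) <- B_i(x)
    changed = True
    while changed:
        changed = False
        for k in range(i, j):
            # rho_{k+1}(x, z) <- rho_k(x, y) and B_{k,k+1}(y, z)
            forward = {(x, z) for (x, y) in rho[k]
                       for (y2, z) in edges[k] if y2 == y}
            # rho_k(x, y) <- rho_{k+1}(x, z) and B_{k,k+1}(y, z)
            backward = {(x, y) for (x, z) in rho[k + 1]
                        for (y, z2) in edges[k] if z2 == z}
            if not forward <= rho[k + 1] or not backward <= rho[k]:
                rho[k + 1] |= forward
                rho[k] |= backward
                changed = True
    return rho


# A hypothetical path instance of length 4 over {0, 1}.
unary = {1: {0, 1}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
binary = {1: {(0, 0)}, 2: {(0, 0), (1, 0), (1, 1)}, 3: {(1, 1)}}
rho = rho_by_rules(unary, binary, 1, 4)
```

The backward rule is what makes the difference in this example: without it, the value 1 on the second level would never be reached from the value 0 on the first level, and the last relation would stay empty.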

Let $I$ be a path instance of CSP(A) of length $\ell$. In the following, we will
again be using the sets $C_i$ from Section 4.

Let $1 \le i_1 < i_2 < \dots < i_k < \ell$ be the complete list of all indices $i$ with a
backward edge in $B_{i,i+1}$ (i.e. all $i$ such that $B_{i,i+1} \cap ((B_i \setminus C_i) \times C_{i+1}) \neq \emptyset$).
For convenience, we let $i_0 = 0$ and $i_{k+1} = \ell$.

Now consider the new path instance $I_\lambda$ (see Figure 5) with variable set
$U = \{1, i_1, i_1+1, i_2, i_2+1, \dots, i_k, i_k+1, \ell\}$. We get $I_\lambda$ from $I|_U$ by filling out the
gaps by relations $\lambda_{I,i_j+1,i_{j+1}}$: For all $j$ such that $i_j + 1 < i_{j+1}$ (i.e. $I|_U$ has
no binary constraint between $i_j+1$ and $i_{j+1}$), we add the binary constraint
$((i_j+1, i_{j+1}), \lambda_{I,i_j+1,i_{j+1}})$ to $I_\lambda$.

Figure 5: The instance $I_\lambda$ with $i_1 = 3$, $i_2 = 6$, $i_3 = 10$ (ellipses mark the sets $C_v$).

By Lemma 17, the constraints of $I_\lambda$ belong to the relational clone of A.
For each $v \in U$, consider the set of all values of $s(v)$ where $s$ is
a solution of $(I_\lambda)_{[1,v]}$. It is easy to show by induction on $v$ that this set equals $C_v$
for all $v \in U$. In particular, we have that $I_\lambda$ is satisfiable if and only if $I$ is
satisfiable. Moreover, $I_\lambda$ has a backward edge in roughly every other binary
constraint. Finally, $\mathcal{P}^3_A$ derives $I_\lambda$ from $I$ by Lemma 18.
We can summarize the findings of this section as follows:

Lemma 19. Let A be a relational structure and let $I$ be an unsatisfiable path
instance of CSP(A). Then $\mathcal{P}^3_A$ derives from $I$ the unsatisfiable path instance
$I_\lambda$ such that any interval of variables of $I_\lambda$ of length at least $2m+2$ contains
at least $m$ indices with backward edges and the constraints of $I_\lambda$ are invariant
under all polymorphisms of A.

6 Symmetric Datalog works for n-permutable 
path instances 

In this section, we put together the results from the previous two sections to
show that for every n-permutable A there is an $M$ such that $\mathcal{P}^M_A(I) \vdash G$ for
every unsatisfiable path instance $I$ of CSP(A):

Theorem 20. For each N and n there exists $f(n,N)$ so that whenever
A is an n-permutable relational structure and $I$ an unsatisfiable path instance
of CSP(A) such that $|B_i| \le N$ for all $i$, then $\mathcal{P}^{f(n,N)}_A(I) \vdash G$.

Proof. We prove the theorem first in the case when A contains symbols for
all binary and unary relations compatible with A, and then show how the general
case follows.

We fix $n$ and proceed by induction on $N$. For $N = 1$, a path instance
is unsatisfiable if and only if at least one of $B_i$, $B_{i,i+1}$ is empty, which $\mathcal{P}^2_A$ easily
detects, so $f(n,1) = 2$ works.

Assume that the theorem is true for all structures and all instances with
sets $B_i$ smaller than $N$. Let $m$ be the number from Lemma 16 for our $n$ and
$N$. We let $f(n,N) = f(n,N-1) + 2m + 6$ and claim that $\mathcal{P}^{f(n,N)}_A(I) \vdash G$
for any unsatisfiable path instance $I$ of CSP(A) whose unary constraints $B_i$ have at most $N$ elements. For
brevity, let us denote $2m+2$ by $L$, so we have $f(n,N) = f(n,N-1) + L + 4$.

Our starting point is the instance $I_\lambda$ from Section 5. By the first part
of Lemma 19, $\mathcal{P}^3_A(I) \vdash I_\lambda$ and $I_\lambda$ is an unsatisfiable path CSP instance of
CSP(A). Consider now what $\mathcal{P}^{L+1}_A$ does on $I_\lambda$. First of all, if the length of
$I_\lambda$ is at most $L$, then $\mathcal{P}^{L+1}_A(I_\lambda) \vdash G$ and we are done (by Lemma 10 we have
$\mathcal{P}^{L+4}_A(I) \vdash G$), so we can safely assume that $I_\lambda$ is longer than $L$. We show
that $\mathcal{P}^{L+1}_A(I_\lambda)$ derives another unsatisfiable instance $K$ that falls within the
scope of the induction hypothesis.

It turns out that $I_\lambda$ contains many backward edges: By Lemma 19, each
interval of $I_\lambda$ of length $2m+2$ contains at least $m$ backward edges. We
can thus use Lemma 16 to show that any interval of $I_\lambda$ of length $L$ contains
at least one binary constraint that is not subdirect. These constraints will
enable us to shrink the unary constraints on $I_\lambda$.

Let $\ell$ be the length of $I_\lambda$. For $a \le i \le j \le b$ we will introduce the
following two relations (we truncate intervals that are not subsets of $[1,\ell]$ to
avoid having to deal with border cases later):

$S_{\lambda,[a,b],i} = \{s(i) : s$ is a solution of $(I_\lambda)_{[\max(1,a),\min(b,\ell)]}\}$,

$S_{\lambda,[a,b]} = \{(s(a), s(b)) : s$ is a solution of $(I_\lambda)_{[\max(1,a),\min(b,\ell)]}\}$.

It is easy to see that these relations lie in the relational clone of A. Obviously,
$\mathcal{P}^{L+1}_A(I_\lambda) \vdash S_{\lambda,[a,b]}(a,b)$ and $\mathcal{P}^{L+1}_A(I_\lambda) \vdash S_{\lambda,[a,b],i}(i)$ whenever $b - a \le L$ (this can
be done in one step as the program is big enough to simply look at the whole
of $(I_\lambda)_{[a,b]}$ at once).

We are now ready to show that $\mathcal{P}^{L+1}_A(I_\lambda) \vdash K$, where $K$ is an unsatisfiable
path instance of CSP(A) whose unary constraints all have at most $N-1$
elements.

We construct $K$ as follows: Denote by $B'_i$ the unary constraint on the
$i$-th variable of $I_\lambda$. Since subdirectness fails somewhere in $[1,L]$, there is
an index $i_1 \in [1,L]$ such that $S_{\lambda,[i_1-1,i_1+1],i_1}$ is strictly smaller than $B'_{i_1}$.
Looking at $[i_1+1, i_1+L]$, we find an index $i_2$ where subdirectness fails again,
so $S_{\lambda,[i_2-1,i_2+1],i_2} \subsetneq B'_{i_2}$. Continuing in this manner, we get an increasing
sequence of indices $1 \le i_1 < i_2 < \dots < i_k \le \ell$ such that for each $j = 1, \dots, k$
we have $|S_{\lambda,[i_j-1,i_j+1],i_j}| \le |B'_{i_j}| - 1$ and $i_{j+1} - i_j \le L$.

Figure 6: Constructing the instance $K$ by looking at solutions of intervals of
$I_\lambda$. Sets shown as ellipses.

We take these indices $i_j$ and observe that we have the following derivations
(see Figure 6; on the first line we use the fact that $L$ is at least 2):

^ tor all j = 1 ,..., * 

Pa (^a) ^ Ife). 

PL‘(/A)^S4,[4/l.4fe)- 

We take these relations and use them to build up our instance K of CSP(A): 
The instance K has variables k,k, ■ ■ ■ kk and unary constraints (A, 

(for the hrst variable), [ij, for j = 2 ,..., k-l and (4, 

for the last variable. The binary constraints of K are {{ijkj+i)^ 
where $j = 1, \dots, k-1$.

It is easy to see that any solution of $K$ would give us a solution of $I_\lambda$,
so $K$ is unsatisfiable. Moreover, all unary constraints of $K$ have at most
$N-1$ members and all constraint relations of $K$ belong to the relational
clone of A. By induction hypothesis, we then have $\mathcal{P}^{f(n,N-1)}_A(K) \vdash G$. It now
remains to use Lemma 10 twice: We get first $\mathcal{P}^{f(n,N-1)+L+1}_A(I_\lambda) \vdash G$, followed
by $\mathcal{P}^{f(n,N-1)+L+4}_A(I) \vdash G$. Since we chose $f(n,N)$ to be $f(n,N-1) + L + 4$,
we have the desired result $\mathcal{P}^{f(n,N)}_A(I) \vdash G$.

It remains to talk about the case when A does not contain symbols for all
unary and binary compatible relations. Denote by B the relational structure
we get from A by adding those missing relational symbols. Let $I$ again
be an instance of CSP(A) with each $B_i$ of size at most $N$. By the above
argument, we get $\mathcal{P}^{f(n,N)}_B(I) \vdash G$, so there is a derivation of $G$ in $\mathcal{P}^{f(n,N)}_B$
from the relations of $I$. Observe now that the instance $I$ only contains
relations from A and that if we take $\mathcal{P}^{f(n,N)}_B$ and delete rules with IDBs that
are not basic relations of A, we get $\mathcal{P}^{f(n,N)}_A$. Therefore, the derivation of
$\mathcal{P}^{f(n,N)}_B(I) \vdash G$ also witnesses that $\mathcal{P}^{f(n,N)}_A(I) \vdash G$ and we are done. □

By taking $M = f(n, |A|)$, we obtain the following corollary:

Corollary 21. For each n-permutable relational structure A there exists
$M \in \mathbb{N}$ so that whenever $I$ is an unsatisfiable path instance of CSP(A),
then $\mathcal{P}^M_A(I) \vdash G$.

7 From linear to symmetric Datalog

It remains to explain how to move from solving path CSP instances to solving 
general CSP instances. This is where we will need linear Datalog. 

Given a relational structure A, we use the idea from [3, Proposition 13]
and define the k-th bubble power of A as the structure $A^{(k)}$ with the universe
$A^k$ and the following basic relations:

• All unary relations $S \subseteq A^k$ that can be defined by taking a conjunction
of basic relations of A (we are also allowed to identify variables and
introduce dummy variables, but not to do existential quantification),
and

• all binary relations of the form

$E_X = \{((a_1, \dots, a_k), (b_1, \dots, b_k)) \in (A^k)^2 : \forall (i,j) \in X,\ a_i = b_j\}$

where $X \subseteq [k]^2$.

In this section, we show that if A has pathwidth duality at most $k-1$,
then all we need to worry about are path CSP instances of CSP($A^{(k)}$). Our
method is straightforward, but we need to get a bit technical to take care
of everything; we recommend that the readers try to prove the following
statement themselves.
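
The binary relations $E_X$ of the bubble power are simple enough to enumerate directly. The following Python sketch lists $E_X$ for a small universe; it is an illustration only, and the function name `e_x` and the use of 1-based coordinate pairs in `X` are choices made for this example.

```python
from itertools import product


def e_x(universe, k, X):
    """All pairs (a, b) of k-tuples with a_i == b_j for every (i, j) in X.

    Coordinates in X are 1-based, matching the notation of the text.
    """
    tuples = list(product(sorted(universe), repeat=k))
    return {(a, b) for a in tuples for b in tuples
            if all(a[i - 1] == b[j - 1] for (i, j) in X)}


# Over the universe {0, 1} with k = 2, require a_2 = b_1.
relation = e_x({0, 1}, 2, {(2, 1)})
```

With the empty set $X$ the relation is all of $(A^k)^2$, and each additional pair in $X$ ties one coordinate of the first tuple to one coordinate of the second.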

Lemma 22. Let A be a (finite) relational structure, $k \in \mathbb{N}$. Assume that A
has pathwidth duality $k-1$ and let $s \in \mathbb{N}$ be such that $\mathcal{P}^s_{A^{(k)}}(I) \vdash G$ for each
unsatisfiable path instance $I$ of CSP($A^{(k)}$). Then there is an $M$, depending only on $k$ and $s$,
such that $\mathcal{P}^M_A$ decides CSP(A).

Proof. We need to show that $\mathcal{P}^M_A(I) \vdash G$ for every unsatisfiable instance $I$ of CSP(A).
Since A has pathwidth duality $k-1$, it is enough to show that $\mathcal{P}^M_A(J) \vdash G$
whenever $J = (V, \mathcal{C})$ is an unsatisfiable CSP(A) instance of pathwidth at most
$k-1$.
Let $X_1, \dots, X_\ell$ be the partition of $V$ witnessing that $J$ has pathwidth at
most $k-1$. If $X_i \subseteq X_{i+1}$ resp. $X_{i+1} \subseteq X_i$ for some $i$, then we can delete the
smaller of the two sets and still have a partition that satisfies Definition 2.
Therefore, we can assume that all neighboring sets are incomparable. From
this, it follows that all sets $X_i$ are pairwise different, because $X_i = X_j$ for
$i < j$ implies $X_i \subseteq X_{i+1}$.

We fix a linear order $\prec$ on $V$. For each $i$, we will represent $X_i$ by the
k-tuple $\chi_i \in V^k$ that we get by listing the elements of $X_i$ from $\prec$-minimal to
$\prec$-maximal, repeating the $\prec$-maximal element if $X_i$ has less than $k$ elements.
Since the sets $X_i$ are pairwise different, we get pairwise different tuples.
Recall that $J|_{X_i}$ denotes the subinstance of $J$ induced by $X_i$.

We now construct an unsatisfiable path instance $K$ of CSP($A^{(k)}$). The
variable set of $K$ is $\{\chi_1, \dots, \chi_\ell\}$. The constraints are as follows:

• For each $i$, the $i$-th unary constraint relation $B_i$ lists all solutions of
$J|_{X_i}$. More formally, we let

$B_i = \{\rho \circ \chi_i : \rho$ is a solution of $J|_{X_i}\} \subseteq A^k$.

It is straightforward to verify that $B_i$ is a basic relation of $A^{(k)}$.

• For each $i = 1, 2, \dots, \ell-1$, we encode the intersection of $X_i$ and $X_{i+1}$ by
adding the constraint $B_{i,i+1} = E_X$ where $X = \{(a,b) : \chi_i(a) = \chi_{i+1}(b)\}$.

If $s$ is a solution of $K$, we can construct a solution $t$ of $J$ as follows: For
each $v \in V$, find an $i \in [\ell]$ and $j \in [k]$ such that $\chi_i(j) = v$ and let $t(v)$ be the
$j$-th coordinate of $s(\chi_i)$. It is an easy exercise to verify that the $t$ we obtain
would be a solution of $J$. Since $J$ is unsatisfiable, so is $K$.
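
The bookkeeping in this construction (the padded tuples, the index sets $X$ linking neighboring tuples, and the decoding of a solution of $K$ into an assignment of $V$) can be sketched as follows. This Python fragment is a hedged illustration: it does not compute the unary relations $B_i$, and the names `bag_tuples`, `link_indices` and `decode`, as well as the example bags, are hypothetical.

```python
def bag_tuples(bags, k, order):
    """Represent each set X_i by a k-tuple: elements in increasing order,
    padded by repeating the maximal element."""
    rank = {v: r for r, v in enumerate(order)}
    chis = []
    for bag in bags:
        elems = sorted(bag, key=rank.__getitem__)
        chis.append(tuple(elems + [elems[-1]] * (k - len(elems))))
    return chis


def link_indices(chi_a, chi_b):
    """X = {(a, b) : chi_a(a) = chi_b(b)}, with 1-based coordinates."""
    return {(a + 1, b + 1)
            for a, u in enumerate(chi_a)
            for b, v in enumerate(chi_b) if u == v}


def decode(chis, s):
    """Read off a value for each original variable from the tuples s(chi_i)."""
    t = {}
    for chi, value in zip(chis, s):
        for var, val in zip(chi, value):
            t.setdefault(var, val)
    return t


# Hypothetical sets X_1, X_2, X_3 over the variables x < y < z < w, with k = 3.
bags = [{"x", "y"}, {"y", "z"}, {"z", "w"}]
chis = bag_tuples(bags, 3, ["x", "y", "z", "w"])
X12 = link_indices(chis[0], chis[1])
assignment = decode(chis, [(0, 1, 1), (1, 2, 2), (2, 0, 0)])
```

The relation $E_X$ for `X12` forces the coordinates that represent the shared variable `y` to agree, which is what makes the decoded assignment well defined.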

Since $K$ is a path instance, we get $\mathcal{P}^s_{A^{(k)}}(K) \vdash G$. Now extend the set
of variables of $K$ to the whole $V^k$ without adding any new constraints.
While this new instance $K'$ is no longer a path instance, it is still true that
$\mathcal{P}^s_{A^{(k)}}(K') \vdash G$ (the derivation of $G$ can just ignore the new variables).

We can now use Lemma 11. The structure B in the Lemma will be $A^{(k)}$
and the relations $S_1, \dots, S_m$ will be $B_1, B_2, \dots, B_\ell$ and $B_{1,2}, B_{2,3}, \dots, B_{\ell-1,\ell}$.
It is straightforward to show that a symmetric Datalog program for A derives the instance $K$ from $J$: Each of the
facts $B_i(\chi_i)$ and $B_{i,i+1}(\chi_i, \chi_{i+1})$ (where $i$ ranges over
$[\ell]$ and $[\ell-1]$, respectively) has a derivation of length one. Lemma 11 then
gives us the required derivation of $G$ from $J$, concluding the proof. □

Observe that if the algebra of polymorphisms of A is n-permutable, then
so is the algebra of polymorphisms of $A^{(k)}$. We are finally ready to prove our
main result:

Theorem (Theorem 6). Let A be a relational structure such that there is
a linear Datalog program that decides CSP(A) and A admits a chain of n
Hagemann-Mitschke terms as polymorphisms. Then there exists a number
$M$ so that $\mathcal{P}^M_A$ decides CSP(A).

Proof. Since there is a linear Datalog program that decides CSP(A), there is
a $k \in \mathbb{N}$ so that A has pathwidth duality at most $k$.

Since all relations of the bubble power $A^{(k+1)}$ are compatible with the
Hagemann-Mitschke terms of A, Corollary 21 gives us that there is an integer
$M'$ such that the program $\mathcal{P}^{M'}_{A^{(k+1)}}$ derives the goal predicate on any unsatisfiable
path instance of CSP($A^{(k+1)}$). Therefore, Lemma 22 (applied with $k+1$ in place of $k$
and $s = M'$) gives us a number $M$ such that $\mathcal{P}^M_A$ decides CSP(A). □

8 Conclusions 

In Theorem 6, we gave a characterization of the class of CSPs solvable by
symmetric Datalog programs. It is not the best possible characterization,
though, because at the moment it is not known for which structures A
there is a linear Datalog program deciding CSP(A).

However, once somebody obtains a characterization of linear Datalog, our
result immediately gives a characterization of symmetric Datalog. To see how
that could come about, let us reexamine some conjectures about the CSPs
solvable by fragments of Datalog [15] that would give us a characterization
of symmetric Datalog:

Conjecture 23 (B. Larose, P. Tesson). Let A be a finite relational structure
such that the algebra of polymorphisms of A generates a congruence
semidistributive variety. Then there is a linear Datalog program that decides
CSP(A).

An alternative way to settle the complexity of CSPs solvable by symmetric
Datalog would be to replace “linear Datalog” in Theorem 6 by just “Datalog”.
In particular, if the following were true, we would get a characterization of
symmetric Datalog, too:

Conjecture 24 (B. Larose, P. Tesson). Let A be a relational structure such 
that the algebra of polymorphisms A of A is idempotent and omits all the 
tame congruence theory types except type 3. Then CSP(A) is solvable by 
linear Datalog. 

If Conjecture 23 or 24 is true, then the following are equivalent for any
core relational structure A:

(a) A is n-permutable for some n and CSP(A) is solvable by Datalog. 

(b) The idempotent reduct of A admits only the tame congruence theory 
type 3. 

(c) There exists a symmetric Datalog program that decides CSP(A). 

Here the implication (a) $\Rightarrow$ (c) (or (b) $\Rightarrow$ (c)) is the unknown one.
Implication (c) $\Rightarrow$ (b) follows from [15, Theorem 4.2], while [HI Theorem 9.15]
together with the characterization of problems solvable by Datalog [1] gives
us (b) $\Rightarrow$ (a).

We end with another citation of [15] whose consequences we find tantalizing:
Assume that L $\neq$ NL and L $\neq$ Mod$_p$L for any $p$ prime. Then we can
add a fourth statement to the above list:

(d) CSP(A) is in L modulo first order reductions.


From one side, symmetric Datalog programs can be evaluated in logspace,
while from the other side [15, Theorem 4.1] shows that if the idempotent reduct of A admits tame
congruence theory types other than 3, then there is a first order reduction of CSP(A)
to a problem that is NL-hard or Mod$_p$L-hard for some $p$.